RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Raw depth images captured by indoor depth sensors usually contain extensive
regions of missing depth values, owing to inherent sensor limitations such as
the inability to perceive transparent objects and a limited distance range.
These incomplete depth maps burden many downstream vision tasks, and a growing
number of depth completion methods have been proposed to alleviate the issue.
While most existing methods can generate accurate dense depth maps from
sparse, uniformly sampled depth maps, they are not suited to completing the
large contiguous regions of missing values that are common and critical in
images captured indoors. To overcome these
challenges, we design a novel two-branch end-to-end fusion network named
RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to
predict a dense, completed depth map. The first branch employs an
encoder-decoder structure that, adhering to the Manhattan world assumption and
using normal maps derived from the RGB-D input as guidance, regresses local
dense depth values from the raw depth map. In the other branch, we propose an
RGB-depth fusion CycleGAN that translates the RGB image into a fine-grained
textured depth map. We adopt adaptive fusion modules, named W-AdaIN, to
propagate features across the two branches, and append a confidence fusion
head that fuses the two branches' outputs into the final depth map. Extensive
experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method
clearly improves depth completion performance, especially in the more
realistic indoor setting, with the help of our proposed pseudo depth maps in
training.

Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An
earlier version was accepted at CVPR 2022 (arXiv:2203.10856).
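To make the fusion step concrete, below is a minimal PyTorch sketch of a
per-pixel confidence fusion head in the spirit of the one described in the
abstract: a small convolutional head predicts a confidence map from the two
branches' features and blends the two depth predictions accordingly. The
module name, layer sizes, and the exact blending rule are illustrative
assumptions for the sketch, not the authors' implementation.

```python
# Minimal sketch of a per-pixel confidence fusion head (assumed design,
# not the paper's exact architecture).
import torch
import torch.nn as nn

class ConfidenceFusionHead(nn.Module):
    """Fuses two dense depth predictions via a learned confidence map."""

    def __init__(self, feat_channels: int = 64):
        super().__init__()
        # Predict a per-pixel confidence in (0, 1) from the concatenated
        # features of the two branches.
        self.conf = nn.Sequential(
            nn.Conv2d(2 * feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat_a, feat_b, depth_a, depth_b):
        # feat_*: (B, C, H, W) branch features; depth_*: (B, 1, H, W) depths.
        c = self.conf(torch.cat([feat_a, feat_b], dim=1))
        # Convex combination: c weights the encoder-decoder branch's depth,
        # (1 - c) weights the fusion-CycleGAN branch's depth.
        return c * depth_a + (1.0 - c) * depth_b
```

Under this assumed design, the confidence map lets the network rely on the
regression branch where raw depth is trustworthy and fall back on the
RGB-to-depth translation branch inside large missing regions.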
Interpretable and Steerable Sequence Learning via Prototypes
One of the major challenges in machine learning today is to provide
predictions that are not only highly accurate but also accompanied by
user-friendly explanations. Although deep neural networks have seen
increasingly popular use for sequence modeling in recent years, it remains
challenging to explain the rationale behind model outputs, which is essential
for building trust and supporting domain experts in validating, critiquing,
and refining the model. We
propose ProSeNet, an interpretable and steerable deep sequence model with
natural explanations derived from case-based reasoning. The prediction is
obtained by comparing the inputs to a few prototypes, which are exemplar cases
in the problem domain. For better interpretability, we define several criteria
for constructing the prototypes, including simplicity, diversity, and
sparsity, and we propose a corresponding learning objective and optimization
procedure. ProSeNet
also provides a user-friendly approach to model steering: domain experts
without any knowledge of the underlying model or its parameters can easily
incorporate their intuition and experience by manually refining the prototypes.
We conduct experiments on a wide range of real-world applications, including
predictive diagnostics for automobiles, ECG and protein sequence
classification, and sentiment analysis on text. The results show that ProSeNet
achieves accuracy on par with state-of-the-art deep learning models. We also
evaluate the interpretability of the results with concrete case studies.
Finally, through a user study on Amazon Mechanical Turk (MTurk), we
demonstrate that the model selects high-quality prototypes that align well
with human knowledge and can be interactively refined for better
interpretability without loss of performance.

Comment: Accepted as a full paper at KDD 2019 on May 8, 2019.
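As an illustration of the prototype-based prediction scheme the abstract
describes, here is a minimal PyTorch sketch: a sequence encoder produces an
embedding, similarities to learned prototype vectors are computed, and a
linear layer maps the similarity vector to class scores. The encoder choice
(a GRU), the Gaussian similarity kernel, and all names are assumptions for
the sketch, not ProSeNet's exact formulation.

```python
# Minimal sketch of prototype-based sequence classification (assumed
# formulation in the spirit of the abstract, not the paper's exact model).
import torch
import torch.nn as nn

class PrototypeSequenceClassifier(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int,
                 num_prototypes: int, num_classes: int):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        # Each row is a prototype in the embedding space; per the abstract,
        # prototypes correspond to exemplar cases in the problem domain and
        # can be inspected and manually refined by domain experts.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, hidden_dim))
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        _, h = self.encoder(x)            # h: (1, batch, hidden_dim)
        e = h.squeeze(0)                  # (batch, hidden_dim)
        # Squared L2 distance to every prototype, turned into a similarity
        # in (0, 1] with a Gaussian kernel.
        d2 = torch.cdist(e, self.prototypes).pow(2)   # (batch, K)
        sim = torch.exp(-d2)
        return self.classifier(sim)       # class logits

# Example usage on a toy batch of 8 sequences of length 20 with 12 features:
# model = PrototypeSequenceClassifier(12, 64, num_prototypes=10, num_classes=2)
# logits = model(torch.randn(8, 20, 12))
```

Because the final prediction is a linear function of prototype similarities,
each output can be explained by the few prototypes most similar to the input,
which is what makes manual prototype refinement a natural steering mechanism.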